Bayesian Bridging Topic Models for Classification
Author
Abstract
We study the problem of constructing topic-based models across different domains for text classification. In real-world applications, unlabeled documents are abundant but labeled documents are sparse, which makes it challenging to build a reliable and adaptive model for classifying large numbers of documents drawn from different domains. Classifiers trained on a source domain may perform poorly on test data from a target domain, and a trained model is also prone to errors among ambiguous, easily confused classes. In this study, we tackle the issues of domain mismatch and confusing classes by conducting discriminative transfer learning for text classification. We propose Bayesian bridging topic models (BTM), learned from a variety of labeled and unlabeled documents, to perform transfer learning for cross-domain text classification. A structural model is built, and its parameters are estimated by maximizing the joint marginal likelihood of the labeled and unlabeled data through a variational inference procedure. We further apply discriminative learning to the proposed model, adjusting its parameters under the minimum classification error (MCE) criterion. We show that, for cross-domain text classification, the proposed model achieves better performance than other models.
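The abstract names the minimum classification error (MCE) criterion for discriminatively adjusting parameters but gives no formulas. The sketch below illustrates the commonly used MCE formulation under stated assumptions: each class k has a score g_k(x), the misclassification measure is d = -g_y(x) + (1/eta) log[(1/(K-1)) sum_{j != y} exp(eta g_j(x))], and the loss is the sigmoid 1/(1 + exp(-gamma d + theta)), minimized by per-sample gradient descent. Plain linear scores g = W x + b stand in for BTM's class scores, which the abstract does not specify; the function names `mce_loss_and_grad`, `mce_train`, `mean_mce_loss` and the hyperparameters `eta`, `gamma`, `theta`, `lr` are hypothetical choices, not the paper's.

```python
import numpy as np


def mce_loss_and_grad(W, b, x, y, eta=2.0, gamma=1.0, theta=0.0):
    """MCE loss and gradients for one sample (x, y).

    Assumption: linear discriminants g = W @ x + b are a hypothetical stand-in
    for the class scores of a model such as BTM.
    """
    g = W @ x + b                          # g_j(x) for every class j, shape (K,)
    others = np.arange(g.shape[0]) != y    # mask selecting the competing classes
    go = g[others]
    m = go.max()                           # shift for a numerically stable log-sum-exp
    # Smoothed best competitor: (1/eta) * log(mean_{j != y} exp(eta * g_j))
    competitor = m + np.log(np.mean(np.exp(eta * (go - m)))) / eta
    d = -g[y] + competitor                 # d > 0 suggests x is misclassified
    loss = 1.0 / (1.0 + np.exp(-gamma * d + theta))  # sigmoid smooths the 0-1 error

    # Chain rule: dloss/dd, then dd/dg_j (-1 for the true class, softmax weights otherwise)
    dloss_dd = gamma * loss * (1.0 - loss)
    w = np.exp(eta * (go - m))
    grad_g = np.zeros_like(g)
    grad_g[y] = -1.0
    grad_g[others] = w / w.sum()
    grad_g *= dloss_dd
    return loss, np.outer(grad_g, x), grad_g   # dW = grad_g x^T, db = grad_g


def mce_train(X, Y, K, lr=0.5, epochs=50, seed=0):
    """Per-sample gradient descent on the MCE loss, starting from zero weights."""
    rng = np.random.default_rng(seed)
    W, b = np.zeros((K, X.shape[1])), np.zeros(K)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            _, dW, db = mce_loss_and_grad(W, b, X[i], Y[i])
            W -= lr * dW
            b -= lr * db
    return W, b


def mean_mce_loss(W, b, X, Y):
    return float(np.mean([mce_loss_and_grad(W, b, x, y)[0] for x, y in zip(X, Y)]))


# Toy usage: three synthetic 2-D clusters (illustrative data only, not from the paper)
rng = np.random.default_rng(1)
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
Y = np.repeat(np.arange(3), 30)
X = centers[Y] + rng.normal(scale=0.8, size=(90, 2))
W0, b0 = np.zeros((3, 2)), np.zeros(3)
W, b = mce_train(X, Y, K=3)
acc = float(np.mean(np.argmax(X @ W.T + b, axis=1) == Y))
print(mean_mce_loss(W0, b0, X, Y), mean_mce_loss(W, b, X, Y), acc)
```

With all-zero weights every score is equal, so d = 0 and the loss is exactly 0.5 per sample; training should push the average below that. In this sketch `eta` controls how closely the smoothed competitor tracks the single strongest rival class, which is how MCE concentrates updates on confusable classes.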
Similar resources
A Validation Test Naive Bayesian Classification Algorithm and Probit Regression as Prediction Models for Managerial Overconfidence in Iran's Capital Market
Corporate directors are influenced by overconfidence, one of the personality traits of individuals; they may make irrational decisions that have a significant impact on the company's performance in the long run. The purpose of this paper is to validate and compare the Naive Bayesian Classification algorithm and probit regression in the prediction of Management's overconfident at pre...
Bayesian Two-Sample Prediction with Progressively Type-II Censored Data for Some Lifetime Models
Prediction on the basis of censored data is a very important topic in many fields, including medical and engineering sciences. In this paper, based on a progressive Type-II right censoring scheme, we discuss Bayesian two-sample prediction. A general form of lifetime model, including some well-known and useful models such as Weibull and Pareto, is considered for obtaining prediction bounds ...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text motivated researchers to use...
Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents' contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...
Journal title:
- J. Inf. Sci. Eng.
Volume 30
Publication date: 2014